54 research outputs found
On information plus noise kernel random matrices
Kernel random matrices have attracted a lot of interest in recent years, from
both practical and theoretical standpoints. Most of the theoretical work so far
has focused on the case where the data is sampled from a low-dimensional
structure. Very recently, the first results concerning kernel random matrices
with high-dimensional input data were obtained, in a setting where the data was
sampled from a genuinely high-dimensional structure---similar to standard
assumptions in random matrix theory. In this paper, we consider the case where
the data is of the type "information+noise." In other words, each
observation is the sum of two independent elements: one sampled from a
"low-dimensional" structure, the signal part of the data, the other being
high-dimensional noise, normalized to not overwhelm but still affect the
signal. We consider two types of noise, spherical and elliptical. In the
spherical setting, we show that the spectral properties of kernel random
matrices can be understood from a new kernel matrix, computed only from the
signal part of the data, but using (in general) a slightly different kernel.
The Gaussian kernel has some special properties in this setting. The elliptical
setting, which is important from a robustness standpoint, is less prone to easy
interpretation.
Comment: Published at http://dx.doi.org/10.1214/10-AOS801 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
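
To make the setting concrete, here is a minimal numerical sketch (not code from the paper): each observation is a low-dimensional signal plus high-dimensional noise scaled to affect but not overwhelm the signal, and one compares a Gaussian kernel matrix built from the noisy data with one built from the signal alone. The normalization and bandwidth below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 200, 400, 2  # n observations in dimension p, signal of dimension d

# Signal lives on a d-dimensional subspace; noise is high-dimensional,
# scaled so its norm is O(1) (an illustrative choice of normalization).
signal = np.zeros((n, p))
signal[:, :d] = rng.standard_normal((n, d))
noise = rng.standard_normal((n, p)) / np.sqrt(p)
X = signal + noise

def gaussian_kernel(Y, h=1.0):
    # K_ij = exp(-||Y_i - Y_j||^2 / (2 h^2))
    sq_norms = (Y ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * h ** 2))

K_full = gaussian_kernel(X)         # kernel matrix from information + noise
K_signal = gaussian_kernel(signal)  # kernel matrix from the signal alone

# The paper relates the spectrum of K_full to a kernel matrix computed from
# the signal part only (in general with a slightly modified kernel);
# compare the top of the two spectra.
print(np.linalg.eigvalsh(K_full)[-5:])
print(np.linalg.eigvalsh(K_signal)[-5:])
```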
Tracy--Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices
We consider the asymptotic fluctuation behavior of the largest eigenvalue of
certain sample covariance matrices in the asymptotic regime where both
dimensions of the corresponding data matrix go to infinity. More precisely, let
$X$ be an $n\times p$ matrix, and let its rows be i.i.d. complex normal vectors
with mean 0 and covariance $\Sigma_p$. We show that for a large class of
covariance matrices $\Sigma_p$, the largest eigenvalue of $X^*X$ is
asymptotically distributed (after recentering and rescaling) as the
Tracy--Widom distribution that appears in the study of the Gaussian unitary
ensemble. We give explicit formulas for the centering and scaling sequences
that are easy to implement and involve only the spectral distribution of the
population covariance, $n$ and $p$. The main theorem applies to a number of
covariance models found in applications. For example, well-behaved Toeplitz
matrices as well as covariance matrices whose spectral distribution is a sum of
atoms (under some conditions on the mass of the atoms) are among the models the
theorem can handle. Generalizations of the theorem to certain spiked versions
of our models and a.s. results about the largest eigenvalue are given. We also
discuss a simple corollary that does not require normality of the entries of
the data matrix and some consequences for applications in multivariate
statistics.
Comment: Published at http://dx.doi.org/10.1214/009117906000000917 in the
Annals of Probability (http://www.imstat.org/aop/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
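
As a sanity check on the flavor of this result, the following sketch simulates the white case $\Sigma_p = I$, where the centering and scaling reduce to the classical constants; this is only the simplest special case of the paper's general formulas.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 200, 100, 500

# Centering and scaling in the white case (Sigma_p = I); the paper's
# general formulas also cover non-identity Sigma_p:
#   mu    = (sqrt(n) + sqrt(p))^2
#   sigma = (sqrt(n) + sqrt(p)) * (1/sqrt(n) + 1/sqrt(p))^(1/3)
mu = (np.sqrt(n) + np.sqrt(p)) ** 2
sigma = (np.sqrt(n) + np.sqrt(p)) * (1.0 / np.sqrt(n) + 1.0 / np.sqrt(p)) ** (1.0 / 3.0)

stats = np.empty(reps)
for r in range(reps):
    # n x p matrix with i.i.d. standard complex Gaussian entries (E|X_ij|^2 = 1)
    X = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2.0)
    lmax = np.linalg.eigvalsh(X.conj().T @ X).max()  # largest eigenvalue of X*X
    stats[r] = (lmax - mu) / sigma

# The rescaled largest eigenvalue should be approximately Tracy--Widom (TW2),
# whose mean is roughly -1.77.
print("empirical mean:", stats.mean(), " empirical sd:", stats.std())
```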
Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond
We place ourselves in the setting of high-dimensional statistical inference,
where the number of variables $p$ in a data set of interest is of the same
order of magnitude as the number of observations $n$. More formally, we study
the asymptotic properties of correlation and covariance matrices, in the
setting where $p/n\to\rho\in(0,\infty)$ for general population covariance. We
show that, for a large class of models studied in random matrix theory,
spectral properties of large-dimensional correlation matrices are similar to
those of large-dimensional covariance matrices. We also derive a
Marčenko--Pastur-type system of equations for the limiting spectral
distribution of covariance matrices computed from data with elliptical
distributions and generalizations of this family. The motivation for this study
comes partly from the possible relevance of such distributional assumptions to
problems in econometrics and portfolio optimization, as well as robustness
questions for certain classical random matrix results. A mathematical theme of
the paper is the important use we make of concentration inequalities.
Comment: Published at http://dx.doi.org/10.1214/08-AAP548 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
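
A rough numerical sketch of the elliptical setting (the radius law, normalization, and population covariance below are arbitrary illustrations, not the paper's assumptions): each observation is a random scalar radius times a correlated Gaussian vector, and the spectrum of the sample correlation matrix is compared with that of the sample covariance matrix of exactly standardized data.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 500  # p/n = 0.5: the high-dimensional regime

# Elliptical data: X_i = w_i * Sigma^{1/2} Z_i, with Z_i standard Gaussian
# and w_i a scalar radius independent of Z_i, normalized so E[w^2] = 1.
Z = rng.standard_normal((n, p))
w = np.sqrt(rng.chisquare(df=3, size=n) / 3.0)  # one arbitrary radius law
pop_sd = np.sqrt(np.where(np.arange(p) < p // 2, 1.0, 4.0))  # variances 1 and 4
X = (w[:, None] * Z) * pop_sd[None, :]

S = np.cov(X, rowvar=False)          # sample covariance matrix
d = 1.0 / np.sqrt(np.diag(S))
R = d[:, None] * S * d[None, :]      # sample correlation matrix

# Sample covariance of the *exactly* standardized data (true sds known here);
# the paper shows correlation matrices inherit the spectral behavior of such
# covariance matrices, so the two spectra should agree closely.
S_std = np.cov(X / pop_sd[None, :], rowvar=False)
print(np.percentile(np.linalg.eigvalsh(R), [5, 50, 95]))
print(np.percentile(np.linalg.eigvalsh(S_std), [5, 50, 95]))
```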
The spectrum of kernel random matrices
We place ourselves in the setting of high-dimensional statistical inference
where the number of variables $p$ in a dataset of interest is of the same order
of magnitude as the number of observations $n$. We consider the spectrum of
certain kernel random matrices, in particular $n\times n$ matrices whose
$(i,j)$th entry is $f(X_i'X_j/p)$ or $f(\|X_i-X_j\|^2/p)$, where $p$ is
the dimension of the data, and $X_i$, $X_j$ are independent data vectors. Here $f$ is
assumed to be a locally smooth function. The study is motivated by questions
arising in statistics and computer science where these matrices are used to
perform, among other things, nonlinear versions of principal component
analysis. Surprisingly, we show that in high dimensions, and for the models we
analyze, the problem becomes essentially linear, which is at odds with
heuristics sometimes used to justify the usage of these methods. The analysis
also highlights certain peculiarities of models widely studied in random matrix
theory and raises some questions about their relevance as tools to model
high-dimensional data encountered in practice.
Comment: Published at http://dx.doi.org/10.1214/08-AOS648 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
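
The following sketch illustrates the flavor of this "essentially linear" phenomenon under strong simplifying assumptions (i.i.d. standard Gaussian data, identity population covariance, f = exp). The surrogate matrix M below is a linear approximation in the spirit of this kind of analysis; the paper's actual statement has additional terms and precise conditions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 500
X = rng.standard_normal((n, p))   # i.i.d. N(0,1) data, Sigma = I

f = np.exp                        # smooth kernel function, applied entrywise
f0, fp0, fpp0, f1 = 1.0, 1.0, 1.0, np.e   # f(0), f'(0), f''(0), f(1) for exp

G = X @ X.T / p                   # Gram matrix of inner products X_i'X_j / p
K = f(G)                          # kernel random matrix K_ij = f(X_i'X_j/p)

# Linear surrogate, specialized to Sigma = I (tr(Sigma^2)/p^2 = 1/p):
#   M = (f(0) + f''(0)/(2p)) 11' + f'(0) XX'/p + (f(1) - f(0) - f'(0)) I
ones = np.ones((n, n))
M = (f0 + fpp0 / (2.0 * p)) * ones + fp0 * G + (f1 - f0 - fp0) * np.eye(n)

# In operator norm, K - M should be small relative to K itself, illustrating
# that the spectrum of the nonlinear kernel matrix is essentially linear here.
gap = np.linalg.norm(K - M, 2)
print("||K - M|| =", gap, "  ||K|| =", np.linalg.norm(K, 2))
```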
A rate of convergence result for the largest eigenvalue of complex white Wishart matrices
It has been recently shown that if $X$ is an $n\times N$ matrix whose entries
are i.i.d. standard complex Gaussian and $l_1$ is the largest eigenvalue of
$X^*X$, there exist sequences $m_{n,N}$ and $s_{n,N}$ such that
$(l_1-m_{n,N})/s_{n,N}$ converges in distribution to $W_2$, the Tracy--Widom
law appearing in the study of the Gaussian unitary ensemble. This probability
law has a density which is known and computable. The cumulative distribution
function of $W_2$ is denoted $F_2$. In this paper we show that, under the
assumption that $n/N\to\gamma\in(0,\infty)$, we can find a function $M(\cdot)$,
continuous and nonincreasing, and sequences $\tilde{\mu}_{n,N}$ and
$\tilde{\sigma}_{n,N}$ such that, for all real $s_0$, there exists an integer
$N(s_0,\gamma)$ for which, if $(n\wedge N)\geq N(s_0,\gamma)$, we have, with
$\tilde{l}=(l_1-\tilde{\mu}_{n,N})/\tilde{\sigma}_{n,N}$,
$$\forall s\geq s_0,\quad (n\wedge N)^{2/3}\,|P(\tilde{l}\leq s)-F_2(s)|\leq
M(s_0)\exp(-s).$$ The
surprisingly good 2/3 rate and qualitative properties of the bounding function
help explain the fact that the limiting distribution is a good
approximation to the empirical distribution of $l_1$ in simulations, an
important fact from the point of view of (e.g., statistical) applications.
Comment: Published at http://dx.doi.org/10.1214/009117906000000502 in the
Annals of Probability (http://www.imstat.org/aop/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
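
The abstract notes that $W_2$ has a known, computable density. One standard numerical recipe (not taken from the paper) evaluates $F_2$ through the Hastings--McLeod solution of the Painlevé II equation; the integration window and tolerances below are pragmatic choices.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import airy

# Tracy--Widom F_2 via the Hastings--McLeod solution of Painleve II:
#   q'' = s q + 2 q^3,  q(s) ~ Ai(s) as s -> +infinity,
#   F_2(s) = exp( - int_s^inf (x - s) q(x)^2 dx ).
# We integrate from the right, augmenting the ODE with
#   I1(s) = int_s^inf q^2 dx  and  I2(s) = int_s^inf (x - s) q^2 dx,
# which satisfy I1' = -q^2 and I2' = -I1.
s_right, s_left = 8.0, -8.0
ai, aip, _, _ = airy(s_right)     # Ai and Ai' at the right endpoint

def rhs(s, y):
    q, qp, i1, i2 = y
    return [qp, s * q + 2.0 * q ** 3, -q * q, -i1]

sol = solve_ivp(rhs, (s_right, s_left), [ai, aip, 0.0, 0.0],
                dense_output=True, rtol=1e-10, atol=1e-12)

def F2(s):
    return np.exp(-sol.sol(s)[3])

for s in (-3.0, -2.0, -1.0, 0.0, 1.0):
    print(f"F_2({s:+.0f}) = {F2(s):.6f}")
```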
Second order accurate distributed eigenvector computation for extremely large matrices
We propose a second-order accurate method to estimate the eigenvectors of
extremely large matrices, thereby addressing a problem of relevance to
statisticians working in the analysis of very large datasets. More
specifically, we show that averaging eigenvectors of randomly subsampled
matrices efficiently approximates the true eigenvectors of the original matrix
under certain conditions on the incoherence of the spectral decomposition. This
incoherence assumption is typically milder than those made in matrix completion
and allows eigenvectors to be sparse. We discuss applications to spectral
methods in dimensionality reduction and information retrieval.
Comment: Complete proofs are included on averaging performance
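
A minimal sketch of the idea as described here (not the paper's exact algorithm or conditions): sparsify a symmetric matrix entrywise, compute the top eigenvector of each sparsified copy, align signs, and average. The test matrix, signal strength, and sampling rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, copies = 300, 0.3, 50  # q = probability of keeping each entry

# Test matrix: a dominant eigenvector u plus symmetric Gaussian noise.
# A dense random u is incoherent, although the paper's assumption is
# milder and also allows sparse eigenvectors.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
W = rng.standard_normal((n, n))
A = 3.0 * np.sqrt(n) * np.outer(u, u) + (W + W.T) / 2.0

estimates = []
for _ in range(copies):
    keep = rng.random((n, n)) < q
    keep = np.triu(keep) | np.triu(keep, 1).T   # symmetric sampling pattern
    A_sub = np.where(keep, A, 0.0) / q          # rescale by 1/q (unbiased in mean)
    vals, vecs = np.linalg.eigh(A_sub)
    v = vecs[:, -1]                             # top eigenvector of the subsample
    if estimates:
        v = v * np.sign(v @ estimates[0])       # resolve the sign ambiguity
    estimates.append(v)

v_bar = np.mean(estimates, axis=0)
v_bar /= np.linalg.norm(v_bar)
print("alignment with truth |<v_bar, u>| =", abs(v_bar @ u))
```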
Operator norm consistent estimation of large-dimensional sparse covariance matrices
Estimating covariance matrices is a problem of fundamental importance in
multivariate statistics. In practice it is increasingly frequent to work with
data matrices of dimension $p\times n$, where $p$ and $n$ are both large.
Results from random matrix theory show very clearly that in this setting,
standard estimators like the sample covariance matrix perform in general very
poorly. In this "large , large " setting, it is sometimes the case that
practitioners are willing to assume that many elements of the population
covariance matrix are equal to 0, and hence this matrix is sparse. We develop
an estimator to handle this situation. The estimator is shown to be consistent
in operator norm, when, for instance, we have $p\asymp n$ as $n\to\infty$. In
other words, the largest singular value of the difference between the estimator
and the population covariance matrix goes to zero. This implies consistency of
all the eigenvalues and consistency of eigenspaces associated to isolated
eigenvalues. We also propose a notion of sparsity for matrices that is
"compatible" with spectral analysis and is independent of the ordering of the
variables.
Comment: Published at http://dx.doi.org/10.1214/07-AOS559 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
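
A minimal sketch in the spirit of a sparsity-exploiting estimator, assuming hard entrywise thresholding at a level of order $\sqrt{\log p/n}$; the paper's actual estimator, threshold constant, and conditions may differ.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 400, 400

# Sparse population covariance: tridiagonal, 1 on the diagonal, 0.4 off it.
Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T

S = np.cov(X, rowvar=False)               # sample covariance matrix
t = 2.0 * np.sqrt(np.log(p) / n)          # threshold of order sqrt(log p / n)
S_hat = np.where(np.abs(S) >= t, S, 0.0)  # hard entrywise thresholding

# Operator-norm (largest singular value) errors: thresholding should beat
# the raw sample covariance in this sparse, "large n, large p" setting.
print("raw error:        ", np.linalg.norm(S - Sigma, 2))
print("thresholded error:", np.linalg.norm(S_hat - Sigma, 2))
```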
- …